ROP and Memory Architecture
AMD made some big changes to its memory architecture in its RV770 chip and they have been carried over onto Cypress. The memory controller hub still reaches out to four 64-bit memory controllers, which are each coupled with a 128KB L2 texture cache (a doubling in size, compared to the previous generation).
This is partly down to the increased texture throughput available inside Cypress, but also because AMD has doubled the number of ROPs (or render backends) in its new GPU. It's all about effective use of the bandwidth available and, if anything, memory bandwidth is the one area where we initially felt the Radeon HD 5870 fell a little short.
The Radeon HD 4890 featured 1GB of 975MHz (3.9GHz effective) GDDR5 memory on its 256-bit memory interface, resulting in a theoretical peak bandwidth of 124.8GB/sec. By contrast, the Radeon HD 5870 has 1GB of GDDR5 clocked at 1.2GHz (4.8GHz effective) attached to the same memory bus, meaning a theoretical peak memory bandwidth of 153.6GB/sec. This is only a 23 per cent rise, where we’ve seen 100 per cent rises in the architecture elsewhere (core count, ROPs, etc.). When questioned, AMD's David Nalasco seemed to have the answers.
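For anyone who wants to check the sums, the bandwidth figures above fall out of a simple calculation: effective data rate multiplied by bus width, divided by eight to convert bits to bytes. The short Python sketch below reproduces them; the function name is our own and purely illustrative.

```python
def peak_bandwidth_gb_s(effective_rate_ghz: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/sec.

    effective_rate_ghz: effective data rate per pin (e.g. 4.8 for 4.8GHz effective)
    bus_width_bits:     total width of the memory interface in bits
    """
    # Bits transferred per second across the whole bus, divided by 8 for bytes.
    return effective_rate_ghz * bus_width_bits / 8


hd4890 = peak_bandwidth_gb_s(3.9, 256)  # Radeon HD 4890
hd5870 = peak_bandwidth_gb_s(4.8, 256)  # Radeon HD 5870
rise_pct = (hd5870 / hd4890 - 1) * 100

print(f"HD 4890: {hd4890:.1f}GB/sec")
print(f"HD 5870: {hd5870:.1f}GB/sec")
print(f"Rise:    {rise_pct:.0f} per cent")
```

Because both cards share the same 256-bit bus, the whole increase comes from the faster memory clock.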
Cypress's memory architecture
He claimed that the Radeon HD 4800-series cards using GDDR5 had almost unlimited bandwidth compared to the amount of compute horsepower available and so there were very few scenarios where it ran out of memory bandwidth. In comparison, the HD 5870 is apparently a much more balanced design, where the card is memory bandwidth limited just under 50 per cent of the time and data limited for the remainder.
Epic’s Tim Sweeney would probably contest this, as he believes that memory bandwidth is the one area of GPU architecture that is a long way behind what's optimal. AMD claimed that his ideal world isn't applicable to graphics, but we’re not quite so convinced. The truth likely lies somewhere in the middle, and there's nothing wrong with having oodles of memory bandwidth on a £300+ graphics card.
However, to get back on topic, Nalasco moved on to talk about how AMD has optimised the use of the available memory bandwidth. He said that, where possible, primary data from the cores is allocated to the local memory controller to reduce congestion. If that particular memory controller is busy, the data is moved on to the next available one, as it is still passing through the memory controller hub.
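To make that allocation policy a little more concrete, here is a minimal Python sketch of the idea as we understood it. It is very much our own illustration rather than AMD's logic: the function name is hypothetical, and both the round-robin search order and the behaviour when every controller is busy are assumptions, since AMD didn't go into that level of detail.

```python
def pick_controller(local: int, busy: list[bool]) -> int:
    """Choose a memory controller for a request (illustrative only).

    local: index of the controller nearest the requesting core
    busy:  one flag per controller, True if it cannot accept work right now
    """
    count = len(busy)
    # Try the local controller first, then walk round the others in order.
    # ASSUMPTION: "next available" means the next index, wrapping round.
    for offset in range(count):
        candidate = (local + offset) % count
        if not busy[candidate]:
            return candidate
    # ASSUMPTION: if everything is busy, queue on the local controller.
    return local


# Four 64-bit controllers, as on Cypress.
print(pick_controller(1, [False, False, False, False]))  # local one is free
print(pick_controller(1, [False, True, True, False]))    # skips busy 1 and 2
```

The point is simply that the local controller is preferred, with the others acting as overflow.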
The ROPs haven't escaped attention in the transition from RV770 to Cypress, either. AMD has not only chosen to double them up (meaning two render back ends per 64-bit memory controller, each capable of blending four pixels per clock) but there are some new capabilities as well. There's a new readback path which allows the texture units to read compressed AA colour buffers, thus improving performance with AMD's custom filter anti-aliasing modes.
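The ROP arithmetic is easy enough to sketch out. The snippet below multiplies up the figures quoted above; the RV770 line assumes one render back end per memory controller, which is what the doubling implies, and the variable names are our own.

```python
MEMORY_CONTROLLERS = 4       # four 64-bit memory controllers
PIXELS_PER_BACK_END = 4      # each render back end blends four pixels per clock


def pixels_per_clock(back_ends_per_controller: int) -> int:
    """Total pixels blended per clock across the whole chip."""
    return MEMORY_CONTROLLERS * back_ends_per_controller * PIXELS_PER_BACK_END


cypress = pixels_per_clock(2)  # two render back ends per controller
rv770 = pixels_per_clock(1)    # ASSUMPTION: one per controller before the doubling

print(f"Cypress: {cypress} pixels per clock")
print(f"RV770:   {rv770} pixels per clock")
```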
AMD has also improved performance when using multiple render targets and there's also support for fast colour clears, which seemed strange at first. However, when asked, AMD said that it was added because some software vendors are a little over-zealous when it comes to clearing the screen, and that this was having an impact on performance.
Probably the biggest news is the return of supersampled anti-aliasing, which unfortunately passed us by during the briefing, so it's something that we haven't fully explored thus far. Supersampling is a brute-force anti-aliasing technique that takes multiple samples of every pixel on screen, which means image quality is unmatched even by the best multisampled anti-aliasing algorithms (even when combined with transparency AA).
It uses a fairly standard set of sample patterns, although Cypress includes support for shifting sample patterns for optimal image quality. Not surprisingly, supersampling absolutely kills performance because it's a terribly inefficient way of doing things, and the Radeon HD 5870 is no different in that respect.
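To show why supersampling is so expensive, here is a minimal Python sketch of the general technique. It is not how Cypress implements it: the scene (a single diagonal edge), the function names and the ordered-grid sample pattern are all our own assumptions for illustration. The key point is that every sample is fully shaded and then averaged, so the shading work scales with the number of samples per pixel.

```python
def shade(x: float, y: float) -> float:
    """Hypothetical scene: white (1.0) where y > x, black (0.0) elsewhere."""
    return 1.0 if y > x else 0.0


def render_supersampled(width: int, height: int, n: int):
    """Render with an n x n ordered grid of samples per pixel.

    Returns the image (rows of averaged pixel values) and the number of
    times shade() was called. ASSUMPTION: ordered-grid pattern, chosen
    only because it is the simplest to write down.
    """
    image = []
    shaded = 0
    for py in range(height):
        row = []
        for px in range(width):
            total = 0.0
            for sy in range(n):
                for sx in range(n):
                    # Place each sample at the centre of its sub-pixel cell.
                    x = px + (sx + 0.5) / n
                    y = py + (sy + 0.5) / n
                    total += shade(x, y)
                    shaded += 1
            # The final pixel is the average of all its samples.
            row.append(total / (n * n))
        image.append(row)
    return image, shaded


no_aa, cost_1x = render_supersampled(2, 2, 1)
ssaa, cost_4x = render_supersampled(2, 2, 2)
print(no_aa[0][0], cost_1x)  # edge pixel is all-or-nothing
print(ssaa[0][0], cost_4x)   # edge pixel gets an in-between value
```

With four samples per pixel the edge pixel picks up an intermediate shade, but the shader runs four times as often to get there, which is exactly the trade-off described above.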